Graph Neural Networks (GNN)


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

1. Graph



  • abstract relations, topology, or connectivity
  • Graphs $G(V,E)$
    • $V$: a set of vertices (nodes)
    • $E$: a set of edges (links, relations)
    • weight (edge property)
      • distance in a road network
      • strength of connection in a personal network
  • Graphs model any situation where you have objects and pairwise relations (symmetric or asymmetric) between the objects
Vertex      Edge                                    Type
People      like each other                         undirected
People      is the boss of                          directed
Tasks       cannot be processed at the same time    undirected
Computers   have a direct network connection        undirected
Airports    planes fly between them                 directed
Cities      can travel between them                 directed

1.1. Types of Graphs

Undirected Graph vs. Directed Graph

  • Undirected graph
    • Edges of an undirected graph point both ways between nodes
    • ex) Two-way road
  • Directed graph
    • A graph in which the edges are directed
    • ex) One-way road

Weighted Graph

  • A graph with edges assigned costs or weights
  • Also called 'Network'
    • ex) connection between cities, length of road, circuit element capacity, communication network usage fee, etc.
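Edge weights can be attached directly in networkx (the library used throughout this notebook) as edge attributes. A minimal sketch with hypothetical city-distance values:

```python
import networkx as nx

# a small weighted graph; the weights are made-up road distances
G = nx.Graph()
G.add_edge('Seoul', 'Busan', weight = 325)
G.add_edge('Seoul', 'Daejeon', weight = 140)
G.add_edge('Daejeon', 'Busan', weight = 200)

# weights are stored as edge attributes and can be read back
print(G['Seoul']['Busan']['weight'])    # 325
```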

1.2. Graph Representation

Graph and Adjacency Matrix

  • A simple undirected graph consists of only nodes and edges
  • A graph can be represented as an adjacency matrix $A$
    • The adjacency matrix $A$ indicates the adjacent nodes of each node
  • A (number of nodes) $\times$ (number of nodes) matrix is needed to represent the adjacency matrix of an undirected graph
    • Symmetric matrix



1.3. Adjacency Matrix

  • Undirected graph $G = (V,E)$



$$ \begin{align*}V &= \{1,2,\cdots,7\} \\ E &= \{\{1,2\},\{1,6\},\{2,3\},\{3,4\},\{3,6\},\{3,7\},\{4,7\},\{5,6\} \} \end{align*} $$


$$\text{Adjacency list} = \begin{cases} \;\; \text{adj}(1) = \{2,6\}\\ \;\; \text{adj}(2) = \{1,3\}\\ \;\; \text{adj}(3) = \{2,4,6,7\}\\ \;\; \text{adj}(4) = \{3,7\}\\ \;\; \text{adj}(5) = \{6\}\\ \;\; \text{adj}(6) = \{1,3,5\}\\ \;\; \text{adj}(7) = \{3,4\} \end{cases}$$


$$ \text{Adjacency matrix (symmetric) } A = \begin{bmatrix} 0&1&0&0&0&1&0\\ 1&0&1&0&0&0&0\\ 0&1&0&1&0&1&1\\ 0&0&1&0&0&0&1\\ 0&0&0&0&0&1&0\\ 1&0&1&0&1&0&0\\ 0&0&1&1&0&0&0\\ \end{bmatrix}$$
  • Directed graph $G = (V,E)$



$$ \begin{align*} V &= \{1,2,\cdots,7\} \\ E &= \{\{1,2\},\{1,6\},\{2,3\},\{3,4\},\{3,7\},\{4,7\},\{6,3\},\{6,5\} \} \end{align*} $$


$$\text{Adjacency list} = \begin{cases} \;\; \text{adj}(1) &= \{2,6\}\\ \;\; \text{adj}(2) &= \{3\}\\ \;\; \text{adj}(3) &= \{4,7\}\\ \;\; \text{adj}(4) &= \{7\}\\ \;\; \text{adj}(5) &= \varnothing\\ \;\; \text{adj}(6) &= \{3,5\}\\ \;\; \text{adj}(7) &= \varnothing \end{cases}$$


$$ \text{Adjacency matrix (asymmetric) } A = \begin{bmatrix} 0&1&0&0&0&1&0\\ 0&0&1&0&0&0&0\\ 0&0&0&1&0&0&1\\ 0&0&0&0&0&0&1\\ 0&0&0&0&0&0&0\\ 0&0&1&0&1&0&0\\ 0&0&0&0&0&0&0\\ \end{bmatrix}$$
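The directed example above can be reproduced with networkx using nx.DiGraph, whose adjacency matrix is asymmetric. A minimal sketch:

```python
import networkx as nx

# directed graph from the example above
G = nx.DiGraph()
G.add_nodes_from(range(1, 8))
G.add_edges_from([(1,2), (1,6), (2,3), (3,4), (3,7), (4,7), (6,3), (6,5)])

A = nx.adjacency_matrix(G).todense()
print(A)    # 7 x 7 asymmetric matrix matching the one above
```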
In [7]:
# !pip install networkx
Collecting networkx
  Downloading networkx-2.6.3-py3-none-any.whl (1.9 MB)
Installing collected packages: networkx
Successfully installed networkx-2.6.3
In [1]:
import networkx as nx
import matplotlib.pyplot as plt

%matplotlib inline
Graph.add_edge
In [2]:
g = nx.Graph()
g.add_edge('a', 'b')
g.add_edge('b', 'c')
g.add_edge('a', 'c')
g.add_edge('c', 'd')
In [3]:
# draw a graph with nodes and edges

nx.draw(g)
plt.show()
In [4]:
# draw a graph with node labels 

pos = nx.spring_layout(g)

nx.draw_networkx_nodes(g, pos) 
nx.draw_networkx_edges(g, pos) 
nx.draw_networkx_labels(g, pos)

plt.axis('off')
plt.show()
Graph.add_nodes_from
Graph.add_edges_from
In [5]:
G = nx.Graph()

G.add_nodes_from([1,2,3,4])
G.add_edges_from([(1,2),(1,3),(2,3),(3,4)])  

# plot a graph 
pos = nx.spring_layout(G)

nx.draw(G, pos, node_size = 500)
nx.draw_networkx_labels(G, pos, font_size = 10)
plt.show()
In [6]:
print(nx.number_of_nodes(G))
print(nx.number_of_edges(G))
4
4
In [7]:
G.nodes()
Out[7]:
NodeView((1, 2, 3, 4))
In [8]:
G.edges()
Out[8]:
EdgeView([(1, 2), (1, 3), (2, 3), (3, 4)])
In [9]:
A = nx.adjacency_matrix(G)

print(A)
print(A.todense())
  (0, 1)	1
  (0, 2)	1
  (1, 0)	1
  (1, 2)	1
  (2, 0)	1
  (2, 1)	1
  (2, 3)	1
  (3, 2)	1
[[0 1 1 0]
 [1 0 1 0]
 [1 1 0 1]
 [0 0 1 0]]

1.4. Degree

Degree of Undirected Graph

  • The degree of a vertex in a graph is the number of edges connected to it
  • Denote the degree of vertex $i$ by $d_{i}$
  • For an undirected graph of $n$ vertices,


$$ d_i = \sum_{j=1}^{n} \; A_{ij} $$

  • Degree matrix $D$ of adjacency matrix $A$


$$D = \text{diag}\{d_1, d_2, \cdots \}$$

  • Example





$$A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} \qquad \Rightarrow \qquad D = \begin{bmatrix} 3 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix} $$
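The degree matrix can be read off by summing the rows of $A$; a quick numpy check of the example above:

```python
import numpy as np

A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [1, 0, 1, 0]])

# degree of each node = row sum of A
d = A.sum(axis = 1)
D = np.diag(d)

print(D)    # diagonal entries (3, 1, 2, 2)
```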

1.5. Self-connecting Edges





$$A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} \qquad \Rightarrow \qquad A+I = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 \end{bmatrix} \qquad \Rightarrow \qquad \tilde D = \begin{bmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} $$

1.6. Neighborhood Normalization

Some nodes have many edges while others have few

  • Adding $I$ adds self-connecting edges

  • Neighboring nodes are weighted by normalized coefficients in the aggregation

  • Normalization prevents numerical instabilities and vanishing/exploding gradients, allowing the model to converge

1) (First attempt) Normalized $\tilde A$

$$\tilde D^{-1}(A+I)$$

2) Normalized $\tilde A$

$$\tilde A = \tilde D^{-1/2}(A+I) \tilde D^{-1/2}$$

Now we consider a feature matrix $H$

  • Weighted sum (or averaging) of neighboring features
$$\tilde A H = \left(\tilde D^{-1/2}(A+I)\tilde D^{-1/2}\right) H$$









In [10]:
import numpy as np
In [11]:
A = np.array([[0,1,1,1],
              [1,0,0,0],
              [1,0,0,1],
              [1,0,1,0]])

A_self = A + np.eye(4)

print(A_self)
[[1. 1. 1. 1.]
 [1. 1. 0. 0.]
 [1. 0. 1. 1.]
 [1. 0. 1. 1.]]
In [12]:
D = np.array(A_self.sum(1)).flatten()
D = np.diag(D)

print(D)
[[4. 0. 0. 0.]
 [0. 2. 0. 0.]
 [0. 0. 3. 0.]
 [0. 0. 0. 3.]]



1) (First attempt) Normalized $\tilde A$

$$\tilde D^{-1}(A+I)$$
  • It is not symmetric.
In [13]:
A_norm = np.linalg.inv(D).dot(A_self)

print(A_norm)
[[0.25       0.25       0.25       0.25      ]
 [0.5        0.5        0.         0.        ]
 [0.33333333 0.         0.33333333 0.33333333]
 [0.33333333 0.         0.33333333 0.33333333]]



2) Normalized $\tilde A$

$$\tilde A = \tilde D^{-1/2}(A+I) \tilde D^{-1/2}$$
  • Now it is symmetric.

  • (Skip the details)

In [14]:
D = np.array(A_self.sum(1))

D_half_norm = np.power(D, -0.5).flatten()
D_half_norm = np.diag(D_half_norm)

A_self = np.asmatrix(A_self)
D_half_norm = np.asmatrix(D_half_norm)

A_half_norm = D_half_norm*A_self*D_half_norm

print(A_half_norm)
[[0.25       0.35355339 0.28867513 0.28867513]
 [0.35355339 0.5        0.         0.        ]
 [0.28867513 0.         0.33333333 0.33333333]
 [0.28867513 0.         0.33333333 0.33333333]]

2. Graph Convolution Network (GCN)

2.1. Convolution

  • In the previous CNN lecture, we saw that CNN has two characteristics: preserving the spatial structure and weight sharing
  • To apply convolution to a graph network, the graph must preserve these characteristics as well

Convolution Layer

  • In CNN, the convolution layer preserves the spatial structure of the input
  • It convolves over all spatial locations
    • Features are extracted at each convolution layer



Weight Sharing

  • Reduce the number of parameters by weight sharing
  • Within the same layer, the same filter will be used throughout image



2.2. Connection between CNN and GCN

  • GCNs perform similar operations where the model learns the features by inspecting neighboring nodes
  • The major difference between CNNs and GCNs is that CNNs are specially built to operate on regular (Euclidean) structured data, while GCNs operate on graph data, where the number of node connections varies and the nodes are unordered (irregular, non-Euclidean structured data)




2.3. Basics of GCN

  • Similar to CNN, GCN updates each node using its adjacent nodes
  • Unlike CNN, each node in GCN has a different number of adjacent nodes
    • Adjacent nodes of each node are indicated by the adjacency matrix $A$
  • Basic process (or terminology) of GCN
    • Message: information passed by neighboring nodes to the central node
    • Aggregate: collect information from neighboring nodes
    • Update: embedding update by combining information from neighboring nodes and from itself






$$ \begin{align*} h_{u}^{(k+1)} &= \text{UPDATE} \left( \text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \right\} \right) \right)\\ \end{align*} $$
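The aggregate/update rule above can be sketched node-by-node. This is a conceptual toy (mean aggregation and a simple averaging update, not the exact GCN layer derived later):

```python
import numpy as np

# toy graph as an adjacency list
neighbors = {0: [1, 2], 1: [0], 2: [0]}

# one scalar feature per node
h = {0: 1.0, 1: 2.0, 2: 4.0}

def aggregate(u):
    # collect messages h_v from all neighbors v of node u (mean aggregation)
    return np.mean([h[v] for v in neighbors[u]])

def update(u):
    # combine the node's own feature with the aggregated neighbor message
    return 0.5 * (h[u] + aggregate(u))

h_new = {u: update(u) for u in neighbors}
print(h_new)    # node 0: 0.5 * (1 + (2 + 4)/2) = 2.0
```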



1) Message Aggregation from Local Neighborhood


$$ \begin{align*} &\text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \right\} \right)\\\\ &\Rightarrow AH^{(k)} \end{align*} $$





2) Update


Adding a non-linear function: $k^{\text{th}}$ layer

$$ \begin{align*} H^{(k+1)} &= f \left( A, H^{(k)} \right) \\ & = \sigma \left( A H^{(k)} \, W \right) \end{align*} $$



$$ \begin{align*} h_{u}^{(k+1)} &= \text{UPDATE} \left( \text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \right\} \right) \right)\\\\ H^{(k+1)} &= \sigma \left(A H^{(k)} \, W_{\text{neigh}}^{(k)} \right) \end{align*} $$


  • $h_1^{(k)}$: feature vector of the first node in the $k^{\text{th}}$ layer
  • $W^{(k)}$: weight of the $k^{\text{th}}$ layer
    • Weight sharing: the same weight is shared within each layer
      • Within a layer, every node is updated in the same way, so the same weight is shared
      • Weight sharing reduces computational complexity and training time





2.4. Further Improvements for GCN

1) Message Passing with Self-Loops

  • As a simplification of the neural message passing approach, it is common to add self-loops to the input graph and omit the explicit update step


$$ \begin{align*} h_{u}^{(k+1)} &= \text{UPDATE} \left( h_{u}^{(k)}, \text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \right\} \right) \right) \\ &= \text{UPDATE} \left( \text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \cup \{u \}\right\} \right) \right) \\ \\ H^{(k+1)} &= \sigma \left( \left(A+I \right)H^{(k)} \, W^{(k)}\right) \end{align*} $$






2) Neighborhood Normalization

  • The most basic neighborhood aggregation operation simply takes the sum of the neighbor embedding.
  • One issue with this approach is that it can be unstable and highly sensitive to node degrees.
  • One solution to this problem is to simply normalize the aggregation operation based upon the degrees of the nodes involved.
  • The simplest approach is to just take a weighted average rather than sum.


$$ \begin{align*} \tilde A &= D^{-1/2}AD^{-1/2} + I \\ & \approx \tilde D^{-1/2}(A+I) \tilde D^{-1/2} \qquad \text{where } \, \tilde D \, \text{ is the degree matrix of } A+I \end{align*} $$






Finally Graph Convolutional Networks


$$ \begin{align*} H^{(k+1)} &= \sigma \left(A H^{(k)} \, W^{(k)} \right) \\\\ &\Downarrow \\\\ H^{(k+1)} &= \sigma \left( \left(A+I \right)H^{(k)} \, W^{(k)}\right) \\\\ &\Downarrow \\\\ H^{(k+1)} &= \sigma \left( \left(\tilde D^{-1/2}(A+I)\tilde D^{-1/2} \right)H^{(k)} \, W^{(k)}\right)\\\\\\ \therefore H^{(k+1)} &= \sigma \left( \tilde A H^{(k)} \, W^{(k)}\right) \end{align*} $$


  • For each layer, the feature matrix and weight matrix are multiplied to create the next feature matrix




In [15]:
import networkx as nx
import matplotlib.pyplot as plt

%matplotlib inline

G = nx.Graph()

G.add_nodes_from([1, 2, 3, 4, 5, 6])
G.add_edges_from([(1, 2), (1, 3), (2, 3), (1, 4), (4, 5), (4, 6), (5, 6)])

nx.draw(G, with_labels = True)
plt.show()
In [16]:
A = nx.adjacency_matrix(G).todense()

print(A)
[[0 1 1 1 0 0]
 [1 0 1 0 0 0]
 [1 1 0 0 0 0]
 [1 0 0 0 1 1]
 [0 0 0 1 0 1]
 [0 0 0 1 1 0]]

Assign a feature vector $H$ so that the nodes can be separated into two groups

In [17]:
H = np.matrix([1,0,0,-1,0,0]).T

print(H)
[[ 1]
 [ 0]
 [ 0]
 [-1]
 [ 0]
 [ 0]]

The product of the adjacency matrix and the node feature matrix represents the sum of neighboring node features

In [18]:
A*H
Out[18]:
matrix([[-1],
        [ 1],
        [ 1],
        [ 1],
        [-1],
        [-1]])
In [19]:
A_self = A + np.eye(6)

print(A_self)
[[1. 1. 1. 1. 0. 0.]
 [1. 1. 1. 0. 0. 0.]
 [1. 1. 1. 0. 0. 0.]
 [1. 0. 0. 1. 1. 1.]
 [0. 0. 0. 1. 1. 1.]
 [0. 0. 0. 1. 1. 1.]]

As with data pre-processing for any neural network, normalize the features to prevent numerical instabilities and vanishing/exploding gradients so that the model converges

In [20]:
D = np.array(A_self.sum(1))

print(D)
[[4.]
 [3.]
 [3.]
 [4.]
 [3.]
 [3.]]
In [29]:
D_half_norm = np.power(D, -0.5).flatten()
D_half_norm = np.diag(D_half_norm)

A_self = np.asmatrix(A_self)
D_half_norm = np.asmatrix(D_half_norm)

A_half_norm = D_half_norm*A_self*D_half_norm
print(A_half_norm)
[[0.25       0.28867513 0.28867513 0.25       0.         0.        ]
 [0.28867513 0.33333333 0.33333333 0.         0.         0.        ]
 [0.28867513 0.33333333 0.33333333 0.         0.         0.        ]
 [0.25       0.         0.         0.25       0.28867513 0.28867513]
 [0.         0.         0.         0.28867513 0.33333333 0.33333333]
 [0.         0.         0.         0.28867513 0.33333333 0.33333333]]
In [22]:
A_half_norm*H
Out[22]:
matrix([[ 0.        ],
        [ 0.28867513],
        [ 0.28867513],
        [ 0.        ],
        [-0.28867513],
        [-0.28867513]])

Build a 2-layer GCN using ReLU as the activation function

In [28]:
np.random.seed(1)

W1 = np.random.randn(1, 4) # input: 1 -> hidden: 4
W2 = np.random.randn(4, 2) # hidden: 4 -> output: 2

def relu(x):
    return np.maximum(0, x)

def gcn(A, H, W):
    # add self-loops, then symmetrically normalize the adjacency matrix
    A_tilde = A + np.eye(6)
    D = np.array(A_tilde.sum(1))
    D_half_norm = np.diag((np.power(D, -0.5)).flatten())
    H_new = D_half_norm*A_tilde*D_half_norm*H*W
    return relu(H_new)

H1 = gcn(A, H, W1)
H2 = gcn(A, H1, W2)

print(H2)
[[0.23428681 0.        ]
 [0.27053111 0.        ]
 [0.27053111 0.        ]
 [0.46745053 0.        ]
 [0.53976538 0.        ]
 [0.53976538 0.        ]]

2.5. Readout: Permutation Invariance

  • The adjacency matrix can differ even though two graphs have the same network structure

    • Even if the edge information between all nodes is the same, the order of values in the matrix may differ due to rotation and symmetry
  • Therefore, for a graph-level representation, the readout layer achieves permutation invariance by applying a shared MLP to each node and summing





  • Node-wise summation


$$ Z_G = \tau \left(\sum_{i \in G} \text{MLP} \left(H_i^{(L)} \right) \right) $$
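A minimal sketch of the node-wise summation readout, with a hypothetical one-layer MLP (random weights) and identity $\tau$; summation over nodes makes the result independent of node ordering:

```python
import numpy as np

np.random.seed(0)

H = np.random.randn(6, 4)        # final node embeddings H^(L): 6 nodes, 4 features
W = np.random.randn(4, 2)        # hypothetical one-layer 'MLP' weights

def mlp(h):
    # shared MLP applied to each node embedding
    return np.maximum(0, h @ W)

# sum over nodes -> one graph-level vector
Z = sum(mlp(H[i]) for i in range(H.shape[0]))

# permuting the nodes does not change the readout
perm = np.random.permutation(6)
Z_perm = sum(mlp(H[i]) for i in perm)
print(np.allclose(Z, Z_perm))    # True
```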




2.6. Overall Structure of GCN





  • Graph information (the feature matrix and the adjacency matrix) is input to the GCN
  • Graph Convolution Layer
    • Updates the information of each node according to the adjacency matrix





  • In the readout layer, all node information is collected with an MLP to determine a value for regression or classification

2.7. Three Types of GNN Problem

  • Task 1: Node classification

  • Task 2: Edge prediction

  • Task 3: Graph classification





3. Lab 1: Node Classification using Graph Convolutional Networks

3.0. List of GNN Python Libraries

  • Deep Graph Library (DGL)
    • Based on PyTorch, TensorFlow or Apache MXNet.
  • Graph Nets
    • DeepMind’s library for building graph networks in Tensorflow and Sonnet
  • Spektral
    • Based on the Keras API and TensorFlow 2
    • We will use this one for demo
In [ ]:
# !pip install spektral==0.6.0
# !pip install tensorflow==2.2.0
# !pip install keras==2.3.0
In [1]:
import numpy as np
import networkx as nx
from tensorflow.keras.utils import to_categorical

from spektral.layers import GraphConv

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dropout
from tensorflow.keras.optimizers import Adam

import tensorflow as tf
from tensorflow.keras.regularizers import l2

from collections import Counter
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

3.1. Data Loading

Download data from here

In [2]:
classes = ['Case_Based', 
           'Genetic_Algorithms', 
           'Neural_Networks', 
           'Probabilistic_Methods', 
           'Reinforcement_Learning', 
           'Rule_Learning', 
           'Theory']
In [11]:
labels_encoded = np.load('./data_files/cora_labels_encoded.npy')
nodes = np.load('./data_files/cora_nodes.npy')
edge_list = np.load('./data_files/cora_edges.npy')
X = np.load('./data_files/cora_features.npy')
data_mask = np.load('./data_files/cora_mask.npy')

N = X.shape[0]
F = X.shape[1]

print('X shape: ', X.shape)
print('\nNumber of nodes (N): ', N)
print('\nNumber of features (F) of each node: ', F)
print('\nCategories: ', classes)

num_classes = len(classes)
print('\nNumber of classes: ', num_classes)
X shape:  (2708, 1433)

Number of nodes (N):  2708

Number of features (F) of each node:  1433

Categories:  ['Case_Based', 'Genetic_Algorithms', 'Neural_Networks', 'Probabilistic_Methods', 'Reinforcement_Learning', 'Rule_Learning', 'Theory']

Number of classes:  7





3.2 Train/Validation/Test Data Splitting

Since GCN is a semi-supervised method, it can be trained with far less labeled data than fully supervised methods. In this task, the training set therefore consists of 20 samples per class; validation and testing use 500 and 1,000 nodes, respectively.
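The masks loaded below come pre-computed with the data files; a sketch of how such a split could be constructed, with hypothetical random labels and 20 training nodes per class:

```python
import numpy as np

# hypothetical labels for N nodes over C classes (the real Cora labels are loaded from file)
N, C = 2708, 7
rng = np.random.default_rng(0)
y = rng.integers(0, C, size = N)

train_mask = np.zeros(N, dtype = bool)
for c in range(C):
    # pick the first 20 nodes of each class for training
    idx = np.where(y == c)[0][:20]
    train_mask[idx] = True

print(train_mask.sum())    # 140 = 20 nodes x 7 classes
```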

In [4]:
# Load index of node for train model
train_mask = data_mask[0]

# Load index of node for validate model
val_mask = data_mask[1]

# Load index of node for test model
test_mask = data_mask[2]
In [5]:
print("All Number of Node for Node Classification: ", len(labels))
print("\n")
print("Number of Trainig Data: ", np.sum(train_mask))
print("\n")
print("Number of Validation Data: ", np.sum(val_mask))
print("\n")
print("Number of Test Data: ", np.sum(test_mask))
All Number of Node for Node Classification:  2708


Number of Trainig Data:  140


Number of Validation Data:  500


Number of Test Data:  1000

3.3 Initializing Graph G

In [6]:
G = nx.Graph(name = 'Cora')
G.add_nodes_from(nodes)
G.add_edges_from(edge_list)

print('Graph info: ', nx.info(G))
Graph info:  Graph named 'Cora' with 2708 nodes and 5278 edges

3.4 Construct and Normalize Adjacency Matrix A

3.4.1 Insert self-loops to A

In [7]:
A = nx.adjacency_matrix(G)

I = np.eye(A.shape[-1], dtype=A.dtype)
A_self = A + I

3.4.2 Normalizing Term $\tilde D^{-1/2}(A+I)\tilde D^{-1/2}$

In [8]:
degree = np.array(A_self.sum(1))

D_half_norm = np.power(degree, -0.5).flatten()

D = np.diag(D_half_norm)

print('D:\n', D)

DAD = D * A_self * D
print('\nDAD:\n', DAD)

DAD = np.array(DAD, dtype = np.float32)
X = np.array(X, dtype = np.float32)
D:
 [[0.37796447 0.         0.         ... 0.         0.         0.        ]
 [0.         0.40824829 0.         ... 0.         0.         0.        ]
 [0.         0.         0.57735027 ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 0.70710678 0.         0.        ]
 [0.         0.         0.         ... 0.         0.5        0.        ]
 [0.         0.         0.         ... 0.         0.         0.4472136 ]]

DAD:
 [[0.14285714 0.         0.         ... 0.         0.         0.        ]
 [0.         0.16666667 0.         ... 0.         0.         0.        ]
 [0.         0.         0.33333333 ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 0.5        0.         0.        ]
 [0.         0.         0.         ... 0.         0.25       0.        ]
 [0.         0.         0.         ... 0.         0.         0.2       ]]

3.5 GCN Model

In [58]:
channels = 16
dropout = 0.5
l2_reg = 5e-4
learning_rate = 1e-2
epochs = 100
es_patience = 10

X_in = Input(shape = (F, ))
fltr_in = Input((N, ), sparse = True)

dropout_1 = Dropout(dropout)(X_in)
graph_conv_1 = GraphConv(channels,
                         activation = 'relu',
                         kernel_regularizer = l2(l2_reg),
                         use_bias = False)([dropout_1, fltr_in])

dropout_2 = Dropout(dropout)(graph_conv_1)
graph_conv_2 = GraphConv(num_classes,
                         activation = 'softmax',
                         use_bias = False)([dropout_2, fltr_in])

model = Model(inputs = [X_in, fltr_in], outputs = graph_conv_2)
optimizer = Adam(lr = learning_rate)
model.compile(optimizer = optimizer,
              loss = 'categorical_crossentropy',
              weighted_metrics = ['acc'])

model.summary()
Model: "model_16"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_31 (InputLayer)           [(None, 1433)]       0                                            
__________________________________________________________________________________________________
dropout_24 (Dropout)            (None, 1433)         0           input_31[0][0]                   
__________________________________________________________________________________________________
input_32 (InputLayer)           [(None, 2708)]       0                                            
__________________________________________________________________________________________________
graph_conv_30 (GraphConv)       (None, 16)           22928       dropout_24[0][0]                 
                                                                 input_32[0][0]                   
__________________________________________________________________________________________________
dropout_25 (Dropout)            (None, 16)           0           graph_conv_30[0][0]              
__________________________________________________________________________________________________
graph_conv_31 (GraphConv)       (None, 7)            112         dropout_25[0][0]                 
                                                                 input_32[0][0]                   
==================================================================================================
Total params: 23,040
Trainable params: 23,040
Non-trainable params: 0
__________________________________________________________________________________________________

3.6 Train Model

In [60]:
validation_data = ([X, DAD], labels_encoded, val_mask)

model.fit([X, DAD],
          labels_encoded,
          sample_weight = train_mask,
          epochs = epochs,
          batch_size = N,
          validation_data = validation_data,
          shuffle = False,)
Epoch 1/100
1/1 [==============================] - 0s 241ms/step - loss: 0.1165 - acc: 0.1214 - val_loss: 0.3649 - val_acc: 0.2240
Epoch 2/100
1/1 [==============================] - 0s 183ms/step - loss: 0.1095 - acc: 0.2571 - val_loss: 0.3535 - val_acc: 0.3520
Epoch 3/100
1/1 [==============================] - 0s 179ms/step - loss: 0.1028 - acc: 0.4214 - val_loss: 0.3401 - val_acc: 0.3700
...
Epoch 86/100
1/1 [==============================] - 0s 186ms/step - loss: 0.0300 - acc: 0.9786 - val_loss: 0.1503 - val_acc: 0.7940
Epoch 87/100
1/1 [==============================] - 0s 179ms/step - loss: 0.0307 - acc: 0.9714 - val_loss: 0.1499 - val_acc: 0.8040
Epoch 88/100
1/1 [==============================] - 0s 180ms/step - loss: 0.0327 - acc: 0.9714 - val_loss: 0.1489 - val_acc: 0.8000
Epoch 89/100
1/1 [==============================] - 0s 180ms/step - loss: 0.0318 - acc: 0.9571 - val_loss: 0.1487 - val_acc: 0.8040
Epoch 90/100
1/1 [==============================] - 0s 183ms/step - loss: 0.0312 - acc: 0.9571 - val_loss: 0.1490 - val_acc: 0.7980
Epoch 91/100
1/1 [==============================] - 0s 178ms/step - loss: 0.0315 - acc: 0.9286 - val_loss: 0.1491 - val_acc: 0.7980
Epoch 92/100
1/1 [==============================] - 0s 178ms/step - loss: 0.0306 - acc: 0.9857 - val_loss: 0.1498 - val_acc: 0.7880
Epoch 93/100
1/1 [==============================] - 0s 177ms/step - loss: 0.0303 - acc: 0.9571 - val_loss: 0.1506 - val_acc: 0.7880
Epoch 94/100
1/1 [==============================] - 0s 187ms/step - loss: 0.0314 - acc: 0.9643 - val_loss: 0.1523 - val_acc: 0.7800
Epoch 95/100
1/1 [==============================] - 0s 179ms/step - loss: 0.0282 - acc: 0.9929 - val_loss: 0.1525 - val_acc: 0.7700
Epoch 96/100
1/1 [==============================] - 0s 179ms/step - loss: 0.0297 - acc: 0.9643 - val_loss: 0.1532 - val_acc: 0.7700
Epoch 97/100
1/1 [==============================] - 0s 179ms/step - loss: 0.0319 - acc: 0.9571 - val_loss: 0.1531 - val_acc: 0.7680
Epoch 98/100
1/1 [==============================] - 0s 177ms/step - loss: 0.0313 - acc: 0.9643 - val_loss: 0.1521 - val_acc: 0.7740
Epoch 99/100
1/1 [==============================] - 0s 185ms/step - loss: 0.0288 - acc: 0.9786 - val_loss: 0.1498 - val_acc: 0.7720
Epoch 100/100
1/1 [==============================] - 0s 182ms/step - loss: 0.0294 - acc: 0.9643 - val_loss: 0.1475 - val_acc: 0.7780
Out[60]:
<tensorflow.python.keras.callbacks.History at 0x27d4d132048>
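The `History` object returned by `fit` records the per-epoch metrics, which makes it easy to visualize the training and validation curves. A minimal sketch — the notebook above did not assign the return value of `fit`, so the `history` dict here is a synthetic stand-in for `history.history`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# stand-in for history.history, where history = model.fit(...)
history = {"loss": [0.9, 0.4, 0.2], "val_loss": [1.0, 0.6, 0.5]}

plt.plot(history["loss"], label="train loss")
plt.plot(history["val_loss"], label="val loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.savefig("loss_curve.png")
```

Comparing the two curves is the quickest way to spot the plateau in `val_loss` visible in the log above while the training loss keeps decreasing.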

3.7 Model Evaluation

In [42]:
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

# slice out the test nodes: features, normalized sub-adjacency, and labels
X_te = X[test_mask]
A_te = DAD[test_mask,:][:,test_mask]
y_te = labels_encoded[test_mask]

y_pred = model.predict([X_te, A_te], batch_size = N)

cm = confusion_matrix(np.argmax(y_te, axis = 1), np.argmax(y_pred, axis = 1))
print('Confusion matrix')
print(cm)

# plot_confusion_matrix(np.argmax(y_te, axis=1), np.argmax(y_pred, axis=1), classes, fontsize=15)
Confusion matrix
[[ 86   1   8   0   7   9   3]
 [  2 135   8   1   3   4   3]
 [  9   5 205  24  11   8  28]
 [  5   1  28 113   6   7  12]
 [  7   4   2   1  66   0   5]
 [  5   2   0   0   0  49   4]
 [ 12   4   8   3   4  11  81]]
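From the confusion matrix printed above, the overall accuracy is the diagonal sum divided by the total count, and per-class recall is each diagonal entry divided by its row sum. A quick sketch using the matrix values from this run:

```python
import numpy as np

# confusion matrix printed above (rows: true class, columns: predicted class)
cm = np.array([[ 86,   1,   8,   0,   7,   9,   3],
               [  2, 135,   8,   1,   3,   4,   3],
               [  9,   5, 205,  24,  11,   8,  28],
               [  5,   1,  28, 113,   6,   7,  12],
               [  7,   4,   2,   1,  66,   0,   5],
               [  5,   2,   0,   0,   0,  49,   4],
               [ 12,   4,   8,   3,   4,  11,  81]])

accuracy = np.trace(cm) / cm.sum()        # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis = 1)   # per-class recall (row-normalized diagonal)
print(accuracy)   # 0.735
```

The per-class recall shows which classes the GCN confuses most; here class 3 (the largest row) loses many nodes to its neighbors in the citation graph.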
In [43]:
plt.figure()
plt.imshow(cm, interpolation = 'nearest', cmap = plt.cm.Blues)

# annotate each cell with its count, switching text color for readability
fmt = 'd'
thresh = cm.max() / 2.
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        plt.text(j, i, format(cm[i, j], fmt), ha = "center", va = "center",
                 color = "white" if cm[i, j] > thresh else "black", fontsize = 15)

plt.colorbar()
plt.tight_layout()
plt.show()

3.8 t-SNE

In [15]:
from sklearn.manifold import TSNE

# collect the activations of every layer for all nodes
layer_outputs = [layer.output for layer in model.layers]
activation_model = Model(inputs = model.input, outputs = layer_outputs)
activations = activation_model.predict([X, DAD], batch_size = N)

# project the hidden-layer embeddings down to 2-D with t-SNE
x_tsne = TSNE(n_components = 2).fit_transform(activations[3])
In [16]:
def plot_tSNE(labels_encoded, x_tsne):
    # color each node by its one-hot-encoded class label
    color_map = np.argmax(labels_encoded, axis = 1)
    plt.figure(figsize = (10,10))
    for cl in range(num_classes):
        indices = np.where(color_map == cl)[0]
        plt.scatter(x_tsne[indices, 0], x_tsne[indices, 1], label = cl)
    plt.legend()
    plt.show()

plot_tSNE(labels_encoded, x_tsne)
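If a class forms a tight cluster in the 2-D projection, the GCN has learned an embedding that separates it from the rest of the graph. As a self-contained sketch of the t-SNE step itself, with random data standing in for the hidden-layer activations:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
emb = rng.normal(size = (50, 16))  # stand-in for hidden-layer node embeddings

# map the 16-dimensional embeddings to 2-D for visualization
x_2d = TSNE(n_components = 2, random_state = 0).fit_transform(emb)
print(x_2d.shape)  # (50, 2)
```

Note that t-SNE preserves local neighborhoods rather than global distances, so cluster sizes and inter-cluster gaps in the plot should not be over-interpreted.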

4. Useful Resources for Further Study

In [1]:
%%html 
<center><iframe src="https://www.youtube.com/embed/fOctJB4kVlM?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [2]:
%%html 
<center><iframe src="https://www.youtube.com/embed/ABCGCf8cJOE?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [3]:
%%html 
<center><iframe src="https://www.youtube.com/embed/0YLZXjMHA-8?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [4]:
%%html 
<center><iframe src="https://www.youtube.com/embed/ex2qllcVneY?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [5]:
%%html 
<center><iframe src="https://www.youtube.com/embed/YL1jGgcY78U?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [6]:
%%html 
<center><iframe src="https://www.youtube.com/embed/8owQBFAHw7E?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [7]:
%%html 
<center><iframe src="https://www.youtube.com/embed/R67-JxtOQzg?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [1]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')